Subspace Mapping of Noisy Text Documents

Authors

  • Axel J. Soto
  • Marc Strickert
  • Gustavo E. Vazquez
  • Evangelos E. Milios
Abstract

Subspace mapping methods aim to project high-dimensional data into a subspace where a specific objective function is optimized. Such dimension reduction allows the removal of collinear and irrelevant variables, which supports informative visualizations and task-related data spaces. These specific and generally de-noised subspaces enable machine learning methods to work more efficiently. We present a new and general subspace mapping method, Correlative Matrix Mapping (CMM), and evaluate its abilities for category-driven text organization by assessing neighborhood preservation, class coherence, and classification. This approach is evaluated for the challenging task of processing short and noisy documents.

Similar resources

A Novel Noise Reduction Method Based on Subspace Division

This article presents a new subspace-based technique for reducing the noise of signals in time-series. In the proposed approach, the signal is initially represented as a data matrix. Then using Singular Value Decomposition (SVD), noisy data matrix is divided into signal subspace and noise subspace. In this subspace division, each derivative of the singular values with respect to rank order is u...

Speech Enhancement Through an Optimized Subspace Division Technique

The speech enhancement techniques are often employed to improve the quality and intelligibility of the noisy speech signals. This paper discusses a novel technique for speech enhancement which is based on Singular Value Decomposition. This implementation utilizes a Genetic Algorithm based optimization method for reducing the effects of environmental noises from the singular vectors as well as t...

Local Semantic Kernels for Text Document Clustering

Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of the natural language. Subspace clustering is an extension of traditional clustering ...

Self-Organization of Distributed Document Archives

Document archives may be regarded as a perfect application arena for unsupervised neural networks because many operations computers have to perform on text documents are classification tasks based on noisy patterns. The "noise" originates from the known inaccuracy of mapping free-form natural language to an indexing vocabulary representing the contents of the documents. In this paper we describe...

Publication date: 2011